An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

Abstract:

A treebank is a corpus with linguistic annotations above the level of parts of speech. During the first half of the present decade, three treebanks were developed for Persian, either originally or subsequently based on dependency grammar: the Persian Treebank (PerTreeBank), the Persian Syntactic Dependency Treebank, and the Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sentence in these corpora consists of a series of relations, each introducing a word in the sentence as a dependent of another word, referred to as its head. Examination of head-dependent pairs extracted from similar contexts in these treebanks reveals frequent, apparently systematic inconsistencies, particularly where the head is a noun or an adjective. This can be explained by the failure to postulate valency structures for nouns and adjectives as well as for verbs, on the assumption that tokens would receive the proper labels regardless. When the notion of valency was borrowed from chemistry to refer to the number of arguments a word controls, it was meant to apply only to verbs. Later developments of dependency grammar proposed nominal and adjectival valency as well, but the significance of the idea seems to have been underestimated. It has therefore been highly improbable for developers of dependency treebanks to design their annotation schemes otherwise. As far as Persian is concerned, the Uppsala Persian Dependency Treebank and the Dependency Persian Treebank (DepPerTreeBank, the dependency version of PerTreeBank) use the Stanford Typed Dependencies. The later version of the former treebank, the Persian Universal Dependency Treebank, uses the Universal Dependencies. These are standard annotation schemes that do not recognize valency for nouns and adjectives. Furthermore, the Persian Syntactic Dependency Treebank uses its own set of dependency relations, in which little attention is paid to the idea.
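The kind of check described above, comparing the labels given to the same head-dependent pair in similar contexts, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `find_inconsistent_pairs`, the tuple layout, and the toy relations are all assumptions made for illustration, and the data are invented rather than drawn from any of the treebanks named above.

```python
from collections import defaultdict


def find_inconsistent_pairs(relations):
    """Return the head-dependent pairs that carry more than one label.

    Each relation is a tuple (head, head_pos, dependent, dependent_pos, label).
    This layout is a hypothetical simplification of a treebank entry.
    """
    labels_by_pair = defaultdict(set)
    for head, head_pos, dep, dep_pos, label in relations:
        labels_by_pair[(head, head_pos, dep, dep_pos)].add(label)
    # Keep only the pairs annotated with two or more distinct labels.
    return {
        pair: sorted(labels)
        for pair, labels in labels_by_pair.items()
        if len(labels) > 1
    }


# Invented toy data: the adjectival head is labeled in two different ways,
# while the verbal head is labeled consistently.
toy_relations = [
    ("afraid", "ADJ", "dog", "NOUN", "obl"),
    ("afraid", "ADJ", "dog", "NOUN", "nmod"),
    ("read", "VERB", "book", "NOUN", "obj"),
    ("read", "VERB", "book", "NOUN", "obj"),
]

inconsistent = find_inconsistent_pairs(toy_relations)
```

In this toy input only the pair headed by the adjective is flagged, since it is the only one with two distinct labels.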
This paper reports the design process of a scheme for annotating Persian dependency structure, as part of an ongoing project to develop a dependency treebank for Persian. The scheme is based on a comprehensive description of Persian syntax according to a theory introduced as the Autonomous Phrases Theory. The main idea is that dependency analyses should acknowledge the significance of phrases because of their cognitive reality. The notion of valency is also extended beyond verbs, so that every dependent of any head type is classified as either a complement or an adjunct. Moreover, to make the resulting annotation scheme reasonably intelligible to the target audience, the latest available standard annotation scheme, Universal Dependencies (UD), was adapted to suit the requirements of the adopted framework. The outcome is a tag set of fifty-three dependency relations: fifteen original labels and the remaining thirty-eight borrowed from Universal Dependencies. Although it makes finer distinctions and therefore provides more detailed annotation than UD, our scheme does not use many more tags than UD does, mainly because a large number of the additional relations are shared by two or three head types.
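As a rough illustration of extending valency beyond verbs, the sketch below classifies a dependent as a complement when it fills a slot in its head's valency frame and as an adjunct otherwise. Everything in it is hypothetical: the `VALENCY` lexicon, the role names, and the English examples are assumptions chosen for readability and do not reproduce the paper's tag set or its description of Persian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependent:
    form: str  # surface form of the dependent phrase
    role: str  # hypothetical role name assigned to the dependent


# Hypothetical valency frames, keyed by (head lemma, head part of speech).
# Verbs, nouns, and adjectives all receive a frame.
VALENCY = {
    ("give", "VERB"): {"subject", "object", "recipient"},
    ("fond", "ADJ"): {"prepositional_object"},
    ("destruction", "NOUN"): {"genitive_object"},
}


def classify(head_lemma, head_pos, dependent):
    """Return 'complement' if the dependent fills a valency slot, else 'adjunct'."""
    frame = VALENCY.get((head_lemma, head_pos), set())
    return "complement" if dependent.role in frame else "adjunct"


# An adjectival head with one valency-bound and one free dependent.
of_music = Dependent("of music", "prepositional_object")
very = Dependent("very", "degree")

fond_of_music = classify("fond", "ADJ", of_music)
fond_very = classify("fond", "ADJ", very)
```

Keying the lexicon by both lemma and part of speech means the same lookup serves verbal, nominal, and adjectival heads, which loosely mirrors the abstract's point that many relations can be shared across head types.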

Similar resources

Universal Dependencies for Persian

The Persian Universal Dependency Treebank (Persian UD) is a recent effort of treebanking Persian with Universal Dependencies (UD), an ongoing project that designs unified and cross-linguistically valid grammatical representations including part-of-speech tags, morphological features, and dependency relations. The Persian UD is the converted version of the Uppsala Persian Dependency Treebank (UP...

An Annotation Scheme for a Persian Treebank

In this paper we present and justify methodological principles and syntactic criteria to design an annotation scheme for a Persian Treebank. The main approaches to the annotation of Treebanks are presented in order to account for taken decisions. After examining these approaches, and taking into account the syntactic characteristics of Persian, the most appropriate one will be selected and its ...

Translation of collocations from English into Persian, based on Ghazala's theory

Ghazala defines collocations as combinations of two or more words that consistently occur together across texts in different languages. In his view, the growing interest in translating collocations within translation studies stems from their importance for the cohesion of language structure. This thesis is essentially restricted to the translation of collocations. Its aim is to examine the application of Ghazala's strategies for translating collocations from English into Persian. Its other aim is to find ...

Multi-word annotation in syntactic treebanks Propositions for Universal Dependencies

This paper discusses how to analyze syntactically irregular expressions in a syntactic treebank. We distinguish such Multi-Word Expressions (MWEs) from comparable non-compositional expressions, i.e. idioms. A solution is proposed in the framework of Universal Dependencies (UD). We further discuss the case of functional MWEs, which are particularly problematic in UD.

Assessing the Annotation Consistency of the Universal Dependencies Corpora

A fundamental issue in annotation efforts is to ensure that the same phenomena within and across corpora are annotated consistently. To date, there has not been a clear and obvious way to ensure annotation consistency of dependency corpora. Here, we revisit the method of Boyd et al. (2008) to flag inconsistencies in dependency corpora, and evaluate it on three languages with varying degrees of ...

A hedging annotation scheme focused on epistemic phrases for informal language

Most existing annotation schemes for hedging were created to aid in the automatic identification of hedges in formal language styles, such as used in scholarly prose. Language with informal tone, typical in much web content, poses a challenge and provides illuminating case studies for the analysis of the use of hedges. We have analysed conversations from a web forum and identified the manners i...

Journal title

Volume 16, Issue 3

Pages 49-60

Publication date: 2019-12
